Use simple linear regression to describe the relationship between a quantitative predictor and quantitative response variable.
Estimate the slope and intercept of the regression line using the least squares method.
Interpret the slope and intercept of the regression line.
The library() function is used to access functionality
that is provided by R packages but is not included in base R.
install.packages() can be used to install new packages; run
this command from the console.
# install.packages("tidyverse")
First, load the package tidyverse that will be used
throughout the tutorial for data visualizations.
library(tidyverse)
Next, load the data with the load() function, specifying the rda data file:
load(file='nhanes1518.rda')
The functions head() and names() can be
used to explore the data. head() outputs the first
several rows of the data, while names() lists all
the variable names.
head(nhanes1518)
## # A tibble: 6 × 52
## SEQN WTINT2YR WTMEC…¹ RIDST…² RIAGE…³ RIDAG…⁴ INDFM…⁵ RIDRE…⁶ DMDED…⁷ DMDED…⁸
## <int> <dbl> <dbl> <int> <int> <int> <dbl> <int> <int> <int>
## 1 83732 134671. 135630. 2 1 62 4.39 3 5 NA
## 2 83733 24329. 25282. 2 1 53 1.32 3 3 NA
## 3 83734 12400. 12576. 2 1 78 1.51 3 3 NA
## 4 83735 102718. 102079. 2 2 56 5 3 5 NA
## 5 83736 17628. 18235. 2 2 42 1.23 4 4 NA
## 6 83737 11252. 10879. 2 2 72 2.82 1 2 NA
## # … with 42 more variables: INDHHIN2 <int>, BMXBMI <dbl>, BMXWAIST <dbl>,
## # BMXWT <dbl>, BMXHT <dbl>, URXUCR <dbl>, WTSB2YR <dbl>, URXCNP <dbl>,
## # URDCNPLC <int>, URXCOP <dbl>, URDCOPLC <int>, URXECP <dbl>, URDECPLC <int>,
## # URXHIBP <dbl>, URDHIBLC <int>, URXMBP <dbl>, URDMBPLC <int>, URXMC1 <dbl>,
## # URDMC1LC <int>, URXMCOH <dbl>, URDMCOLC <int>, URXMEP <dbl>,
## # URDMEPLC <int>, URXMHBP <dbl>, URDMHBLC <int>, URXMHH <dbl>,
## # URDMHHLC <int>, URXMHNC <dbl>, URDMCHLC <int>, URXMHP <dbl>, …
names(nhanes1518)
## [1] "SEQN" "WTINT2YR" "WTMEC2YR" "RIDSTATR" "RIAGENDR" "RIDAGEYR"
## [7] "INDFMPIR" "RIDRETH1" "DMDEDUC2" "DMDEDUC3" "INDHHIN2" "BMXBMI"
## [13] "BMXWAIST" "BMXWT" "BMXHT" "URXUCR" "WTSB2YR" "URXCNP"
## [19] "URDCNPLC" "URXCOP" "URDCOPLC" "URXECP" "URDECPLC" "URXHIBP"
## [25] "URDHIBLC" "URXMBP" "URDMBPLC" "URXMC1" "URDMC1LC" "URXMCOH"
## [31] "URDMCOLC" "URXMEP" "URDMEPLC" "URXMHBP" "URDMHBLC" "URXMHH"
## [37] "URDMHHLC" "URXMHNC" "URDMCHLC" "URXMHP" "URDMHPLC" "URXMIB"
## [43] "URDMIBLC" "URXMNP" "URDMNPLC" "URXMOH" "URDMOHLC" "URXMZP"
## [49] "URDMZPLC" "WTINT4YR" "WTMEC4YR" "WTSB4YR"
In this chapter, we introduce linear regression, which models the relationship between a response variable and one or more predictor variables by fitting a linear equation to observed data. The goal of linear regression is to find the best-fitting line (or hyperplane) that minimizes the sum of the squared differences between the observed responses and the values predicted by the equation. The resulting equation can then be used to predict the response for new inputs. In essence, linear regression answers the question of how changes in the predictor variables relate to changes in the response variable.
We will illustrate linear regression by investigating the relationship between age and BMI.
We are interested in the relationship between age and BMI, especially among adults, so we first restrict the data to participants aged 18 and older:
nhanes1518 <- nhanes1518 %>% filter(RIDAGEYR >= 18)
The variables in the nhanes dataset are described in the file nhanes-codebook.txt.
# Basic histogram
ggplot(nhanes1518, aes(x=RIDAGEYR)) + geom_histogram() + labs(x = "Age", title = "Distribution of Age")
# Change the width of bins
ggplot(nhanes1518, aes(x=RIDAGEYR)) +
geom_histogram(binwidth=1)+labs(x = "Age", title = "Distribution of Age")
# Change colors
p <- ggplot(nhanes1518, aes(x=RIDAGEYR)) +
geom_histogram(color="black", fill="white") + labs(x = "Age", title = "Distribution of Age")
We see that the distribution of age is asymmetric, with a pronounced peak at age 80 (in NHANES, ages of 80 and over are recorded as 80).
We can also add a mean line, as well as overlay a transparent density plot. The value of alpha controls the level of transparency:
# Add mean line
p+geom_vline(aes(xintercept=mean(RIDAGEYR)),
color="blue", linetype="dashed", size=1)
# Histogram with density plot
ggplot(nhanes1518, aes(x=RIDAGEYR)) +
geom_histogram(aes(y=..density..), colour="black", fill="white")+
geom_density(alpha=.2, fill="#FF6666") # add a layer of density
The assumptions of the linear regression model are as follows: the relationship between the predictor(s) and the response is linear; the errors are independent; the errors have constant variance (homoscedasticity); and the errors are normally distributed.
We will verify that our data meets all the assumptions above in Section 5. Model Diagnostics.
Simple linear regressions take the form
\[Y_i = \beta_0 +\beta_1 X_i +\epsilon_i\] where \(Y_i\) is the dependent variable, \(X_i\) is the independent variable, and \(\epsilon_i\) is the random error term.
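The least squares estimates of the slope and intercept have closed-form solutions: the slope is the sum of cross-deviations divided by the sum of squared deviations of the predictor, and the intercept is the mean response minus the slope times the mean predictor. A small sketch on simulated data (the variable names and values below are made up for illustration, not taken from the NHANES data):

```r
# Simulated BMI-like predictor and age-like response
set.seed(1)
x <- rnorm(100, mean = 28, sd = 6)
y <- 40 + 0.5 * x + rnorm(100, sd = 10)

# Closed-form least squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# They match the coefficients reported by lm()
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))  # TRUE
```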
We'll start by fitting a simple linear model using the
lm() function. In the lm() function, the first
variable is the response variable and the variables to the right of the
~ symbol are the predictor variable(s). Here we use age as
the response, and BMI as the predictor variable.
lm.fit <- lm(RIDAGEYR ~ BMXBMI, data = nhanes1518)
There are several ways that we can examine the model results. The
summary() function gives a more extensive overview of the
model fit:
summary(lm.fit)
##
## Call:
## lm(formula = RIDAGEYR ~ BMXBMI, data = nhanes1518)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.786 -16.164 0.086 14.837 32.948
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.41461 0.73402 61.871 < 2e-16 ***
## BMXBMI 0.11531 0.02413 4.778 1.79e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.48 on 11094 degrees of freedom
## (752 observations deleted due to missingness)
## Multiple R-squared: 0.002053, Adjusted R-squared: 0.001963
## F-statistic: 22.83 on 1 and 11094 DF, p-value: 1.795e-06
\(\beta_0\): Not meaningfully interpretable in our case. For a person with a BMI of 0 (which is unrealistic), the expected age is 45.415. \(\beta_1\): For every unit increase in a person's BMI, his or her age is expected to be higher by 0.115 years on average.
p value: The p value tells us how likely the data we have observed would be under the null hypothesis (more material on null hypotheses in subsequent tutorials), i.e. that there is no linear association between the predictor BMI and the response age. From the model above, the predictor BMXBMI has a p value of 1.79e-06, which tells us that BMI is a statistically significant predictor of age.
The coefficients of the linear regression model can be extracted
using the coef() function and the confidence interval(s)
with the confint() function.
coef(lm.fit)
## (Intercept) BMXBMI
## 45.4146068 0.1153059
confint(lm.fit)
## 2.5 % 97.5 %
## (Intercept) 43.97580422 46.8534094
## BMXBMI 0.06799999 0.1626119
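The intervals reported by confint() can also be computed by hand as the estimate plus or minus a t-quantile times the standard error, both taken from the summary() table. A sketch on simulated data (names made up for illustration):

```r
# Simulated data and a simple fit
set.seed(2)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)

# Manual 95% confidence interval for the slope:
# estimate +/- t(0.975, residual df) * standard error
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]
tcrit <- qt(0.975, df = fit$df.residual)
manual <- c(est - tcrit * se, est + tcrit * se)

# Matches the row for x in confint()
all.equal(unname(confint(fit)["x", ]), manual)  # TRUE
```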
We can use the predict() function to obtain prediction
intervals or confidence intervals for a given value of the predictor
variable, BMXBMI. Note that when using the predict()
function, the column names and format of the new points at which to
predict need to be the same as the original data frame used to fit the
lm() model. If you encounter errors using the
predict() function, this is a good first thing to
check.
predict(lm.fit, data.frame(BMXBMI = (c(70, 80, 90))), interval = "confidence")
## fit lwr upr
## 1 53.48602 51.54110 55.43094
## 2 54.63908 52.22710 57.05106
## 3 55.79214 52.91114 58.67314
predict(lm.fit, data.frame(BMXBMI = (c(70, 80, 90))), interval = "prediction")
## fit lwr upr
## 1 53.48602 17.21825 89.75379
## 2 54.63908 18.34327 90.93489
## 3 55.79214 19.46215 92.12213
Prediction intervals and confidence intervals are both statistical tools used to quantify uncertainty in a particular outcome or parameter. However, they have different meanings and interpretations.
Confidence interval: a range of values that is likely to contain the true value of a parameter with a certain degree of confidence. For example, if you are estimating the mean age of a population based on a sample, a 95% confidence interval would provide a range of values within which the true population mean age is expected to lie with 95% confidence. The width of the confidence interval reflects the uncertainty in the estimation, with wider intervals indicating more uncertainty.
Prediction: an estimate of a specific value or outcome based on the statistical model. For example, in our case, when we use a linear regression model to predict the age of a person based on their BMI, a prediction is the estimated age for a person with a specific BMI. Predictions are usually accompanied by a measure of how accurate the estimate is, such as the mean squared error or the root mean squared error. Unlike a confidence interval, which provides a range of values for the parameter, a prediction interval provides a range of values for the actual observation or outcome.
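This difference shows up numerically: at the same new predictor value, the prediction interval is always wider than the confidence interval, because it also accounts for the scatter of individual observations around the regression line. A sketch on simulated data (names made up for illustration):

```r
# Simulated data and a simple fit
set.seed(3)
x <- rnorm(200)
y <- 5 + 3 * x + rnorm(200)
fit <- lm(y ~ x)

# Both intervals at the same new point x = 1
new <- data.frame(x = 1)
ci <- predict(fit, new, interval = "confidence")
pi <- predict(fit, new, interval = "prediction")

# The prediction interval is wider than the confidence interval
(ci[, "upr"] - ci[, "lwr"]) < (pi[, "upr"] - pi[, "lwr"])  # TRUE
```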
We can plot the variables BMXBMI and
RIDAGEYR using the plot() function and overlay
the regression line found using lm() with the
abline() function.
# put the predictor BMXBMI on the x-axis so that abline() overlays the fitted line correctly
plot(x=nhanes1518$BMXBMI, y=nhanes1518$RIDAGEYR, xlab="Body Mass Index (kg/m^2)", ylab="Age (years)")
abline(lm.fit)
Here the dollar sign specifies that we retrieve the columns
BMXBMI and RIDAGEYR from the dataset
nhanes1518.
We can experiment with different options for abline() by
changing the line width and color in abline().
plot(x=nhanes1518$BMXBMI, y=nhanes1518$RIDAGEYR, xlab="Body Mass Index (kg/m^2)", ylab="Age (years)")
abline(lm.fit, lwd = 3, col = "red")
plot(x=nhanes1518$BMXBMI, y=nhanes1518$RIDAGEYR, xlab="Body Mass Index (kg/m^2)", ylab="Age (years)", col = "red")
The pch argument in plot() changes the
shape/type of the points that are plotted.
plot(x=nhanes1518$BMXBMI, y=nhanes1518$RIDAGEYR, xlab="Body Mass Index (kg/m^2)", ylab="Age (years)", pch = 20)
plot(x=nhanes1518$BMXBMI, y=nhanes1518$RIDAGEYR, xlab="Body Mass Index (kg/m^2)", ylab="Age (years)", pch = "+")
Optional: We can make a similar plot using ggplot, where
the linear regression fit is added with geom_smooth().
ggplot(nhanes1518, aes(x = RIDAGEYR, y = BMXBMI)) +
geom_smooth(method = "lm", formula = y ~ x, colour = "blue") +
geom_point() +
ggtitle("Age vs. BMI for the nhanes data")
The par() function can be used to create a grid of
multiple subplots.
par(mfrow = c(2, 2))
plot(lm.fit)
The diagnostic plots show residuals in four different ways:
Residuals vs Fitted. Used to check the linearity assumption. A roughly horizontal line without distinct patterns is an indication of a linear relationship, which is good. The model we fitted shows roughly a linear relationship, with no distinct patterns (such as a fan or funnel shape) in the residuals vs. fitted plot.
Normal Q-Q. Used to examine whether the residuals are normally distributed. It's good if the residual points follow the straight dashed line. Our Q-Q plot generally follows the straight dashed line, with some deviation at the upper end of the theoretical quantiles.
Scale-Location (or Spread-Location). Used to check the homogeneity of variance of the residuals (homoscedasticity). Horizontal line with equally spread points is a good indication of homoscedasticity.
Residuals vs Leverage. Used to identify influential cases, that is, extreme values that might influence the regression results when included in or excluded from the analysis. Based on the residuals vs. leverage plot, there are no influential points according to Cook's distance. However, some points with large standardized residuals could be flagged as outliers.
Some other metrics from the model:
We can use the residuals() and rstudent()
functions to extract the residuals and studentized residuals,
respectively, from the linear model and plot them along with the
predicted values.
plot(predict(lm.fit), residuals(lm.fit))
plot(predict(lm.fit), rstudent(lm.fit))
Additionally, we can compute the influence matrix for the predictors
using the hatvalues() function.
plot(hatvalues(lm.fit))
which.max(hatvalues(lm.fit))
## 8723
## 8196
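For intuition, hatvalues() returns the diagonal of the hat (projection) matrix \(H = X(X'X)^{-1}X'\), where \(X\) is the model matrix; a point's leverage is its diagonal entry. A sketch on simulated data (names made up for illustration):

```r
# Simulated data and a simple fit
set.seed(4)
x <- rnorm(30)
y <- 2 + x + rnorm(30)
fit <- lm(y ~ x)

# Compute the hat matrix directly from the model matrix
X <- model.matrix(fit)
H <- X %*% solve(t(X) %*% X) %*% t(X)

# Its diagonal matches hatvalues()
all.equal(unname(hatvalues(fit)), unname(diag(H)))  # TRUE
```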
The lm() function can also fit multiple regression
models. In this section, we will use RIDAGEYR,
BMXWAIST, BMXWT, and BMXHT as
predictors of the response variable BMXBMI.
Multiple linear regression allows us to evaluate the relationship between a predictor and the response while controlling for the effects (i.e., removing the effects) of other variables.
lm.fit <- lm(BMXBMI ~ RIDAGEYR+ BMXWAIST + BMXWT + BMXHT, data = nhanes1518)
summary(lm.fit)
##
## Call:
## lm(formula = BMXBMI ~ RIDAGEYR + BMXWAIST + BMXWT + BMXHT, data = nhanes1518)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.1176 -0.3262 0.0503 0.3471 8.7914
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 57.4364793 0.2080287 276.1 < 2e-16 ***
## RIDAGEYR -0.0039875 0.0005317 -7.5 6.88e-14 ***
## BMXWAIST 0.0166524 0.0015277 10.9 < 2e-16 ***
## BMXWT 0.3443669 0.0012653 272.2 < 2e-16 ***
## BMXHT -0.3463345 0.0011206 -309.1 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8401 on 10529 degrees of freedom
## (1314 observations deleted due to missingness)
## Multiple R-squared: 0.986, Adjusted R-squared: 0.986
## F-statistic: 1.849e+05 on 4 and 10529 DF, p-value: < 2.2e-16
BMXWT: The coefficient for the predictor
BMXWT is 0.3444, which means that for every one-unit
increase in the participant's weight, the BMI is expected
to increase by 0.3444 on average, holding all else constant (that is,
holding the other predictors RIDAGEYR,
BMXWAIST, and BMXHT constant).
# we can use select to keep only the variables of interest
nhanes_core <- nhanes1518 %>% select(BMXBMI, RIDAGEYR, BMXWAIST, BMXWT, BMXHT)
# In the lm() formula, a dot . can be used to include all variables in the data frame as predictors.
lm.fit1 <- lm(nhanes1518$BMXBMI ~ ., data = nhanes_core)
summary(lm.fit1)
##
## Call:
## lm(formula = nhanes1518$BMXBMI ~ ., data = nhanes_core)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.1176 -0.3262 0.0503 0.3471 8.7914
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 57.4364793 0.2080287 276.1 < 2e-16 ***
## RIDAGEYR -0.0039875 0.0005317 -7.5 6.88e-14 ***
## BMXWAIST 0.0166524 0.0015277 10.9 < 2e-16 ***
## BMXWT 0.3443669 0.0012653 272.2 < 2e-16 ***
## BMXHT -0.3463345 0.0011206 -309.1 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8401 on 10529 degrees of freedom
## (1314 observations deleted due to missingness)
## Multiple R-squared: 0.986, Adjusted R-squared: 0.986
## F-statistic: 1.849e+05 on 4 and 10529 DF, p-value: < 2.2e-16
# If we want to exclude specific variables from the list of predictors, we can use the `-` notation:
# for example, `~ . - RIDAGEYR` would include every predictor except age.
# In the following example, including `-1` excludes the intercept from the model.
lm.fit1 <- lm(nhanes1518$BMXBMI ~ .- 1, data = nhanes_core)
summary(lm.fit1)
##
## Call:
## lm(formula = nhanes1518$BMXBMI ~ . - 1, data = nhanes_core)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.0674 -1.4507 0.0126 1.5069 20.8889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## RIDAGEYR -0.0343057 0.0014932 -22.98 <2e-16 ***
## BMXWAIST 0.3174566 0.0030740 103.27 <2e-16 ***
## BMXWT 0.0900488 0.0024898 36.17 <2e-16 ***
## BMXHT -0.0483656 0.0008657 -55.87 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.412 on 10530 degrees of freedom
## (1314 observations deleted due to missingness)
## Multiple R-squared: 0.9937, Adjusted R-squared: 0.9937
## F-statistic: 4.127e+05 on 4 and 10530 DF, p-value: < 2.2e-16
Apart from diagnostics for simple linear regression, we also need to perform multicollinearity checks for multiple linear regression:
library(car)
# Calculate VIFs
vifs <- vif(lm.fit)
# Print VIFs
print(vifs)
## RIDAGEYR BMXWAIST BMXWT BMXHT
## 1.420864 10.107175 11.503433 1.893344
Variance Inflation Factor (VIF): The VIF measures the increase in the variance of an estimated coefficient due to multicollinearity. A VIF value greater than 5 indicates strong multicollinearity in the model. Here BMXWAIST and BMXWT both have VIFs above 10, so there is substantial multicollinearity between waist circumference and weight, which is expected since the two measurements are closely related.
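The VIF for a predictor can also be computed by hand as \(1/(1 - R_j^2)\), where \(R_j^2\) is the R-squared from regressing that predictor on the remaining predictors. A sketch on simulated data (names made up for illustration; the car package is not needed for the manual computation):

```r
# Simulate predictors where x2 is deliberately correlated with x1
set.seed(5)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)
x3 <- rnorm(n)
y  <- x1 + x2 + x3 + rnorm(n)

# VIF of x1: regress x1 on the other predictors, then 1 / (1 - R^2)
r2 <- summary(lm(x1 ~ x2 + x3))$r.squared
manual_vif <- 1 / (1 - r2)
manual_vif  # well above 1, reflecting the x1-x2 correlation
```

With the car package loaded, this value should agree with vif(lm(y ~ x1 + x2 + x3))["x1"].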
nhanes_na_removed <- cbind(nhanes1518[1:5], nhanes1518$BMXBMI, nhanes1518$INDHHIN2)
nhanes_income <- dplyr::rename(nhanes_na_removed, income = "nhanes1518$INDHHIN2") %>% na.omit()
nhanes_income <- dplyr::rename(nhanes_income, BMI = "nhanes1518$BMXBMI")
# turn the quantitative variable into a categorical variable
nhanes_income$income <- as.character(nhanes_income$income)
Now we explore the effect of income on the response BMI. In the dataset, income is stored as a categorical predictor with levels 1, 2, ..., 15, 77, 99. The encoding of the income categories can be found on this website:
https://wwwn.cdc.gov/nchs/nhanes/2011-2012/demo_g.htm#INDHHIN2
We first drop the categories with values 77 (Refused) and 99 (Don't Know):
nhanes_income <- subset(nhanes_income, income!="77" & income!="99")
Then we fit the linear regression on categorical variable
income:
summary(lm(BMI ~ income, data = nhanes_income))
##
## Call:
## lm(formula = BMI ~ income, data = nhanes_income)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.053 -5.097 -1.153 3.796 55.708
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.08676 0.42827 67.917 < 2e-16 ***
## income10 0.25666 0.53138 0.483 0.62910
## income12 0.07855 0.56365 0.139 0.88917
## income13 1.15974 0.72021 1.610 0.10737
## income14 0.35724 0.48585 0.735 0.46218
## income15 -0.60605 0.46019 -1.317 0.18788
## income2 1.16654 0.57274 2.037 0.04170 *
## income3 0.65874 0.52391 1.257 0.20866
## income4 0.39053 0.51242 0.762 0.44599
## income5 0.61757 0.51184 1.207 0.22763
## income6 0.91220 0.47882 1.905 0.05680 .
## income7 0.86657 0.48244 1.796 0.07249 .
## income8 0.96502 0.49816 1.937 0.05275 .
## income9 1.40552 0.51288 2.740 0.00615 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.255 on 10176 degrees of freedom
## Multiple R-squared: 0.006989, Adjusted R-squared: 0.00572
## F-statistic: 5.509 on 13 and 10176 DF, p-value: 4.42e-10
Baseline: income category 1 corresponding to a household income of 0 to 4,999 dollars.
Model Interpretation:
Intercept: The intercept 29.087 means that for people in the baseline income category (income category 1, corresponding to a household income of 0 to 4,999 dollars), the BMI is expected to be 29.087 on average.
income6: The coefficient for the predictor
income6 is 0.9122, which means that for participants in
household income category 6 (25,000 to 34,999 dollars per year), the BMI
is expected to be 0.9122 higher than that of participants with household
income in category 1 (0 to 4,999 dollars), on average.
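Under the hood, R expands a categorical predictor into indicator (dummy) columns via the model matrix, with the first level absorbed into the intercept as the baseline. A sketch with made-up income levels:

```r
# A small made-up categorical variable with levels "1", "2", "6"
inc <- factor(c("1", "2", "6", "1", "6"))

# The design matrix has an intercept column plus one indicator
# column per non-baseline level (inc2 and inc6); rows from
# category "1" are zero in both, so "1" is the baseline.
mm <- model.matrix(~ inc)
colnames(mm)  # "(Intercept)" "inc2" "inc6"
```

Note that because income was converted with as.character(), its levels sort lexicographically, which is why income10 appears before income2 in the summary output above.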
There are two ways to include interaction terms in the model,
: and *. The : symbol only
includes the interaction term between the two variables, while the
* symbol includes the variables themselves, as well as the
interaction terms. This means that BMXWT*BMXWAIST is
equivalent to BMXWT + BMXWAIST + BMXWT:BMXWAIST.
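This equivalence can be checked directly with model.matrix(): the design matrix built from a*b is identical to the one built from a + b + a:b. A sketch with made-up variables:

```r
# Two made-up continuous variables
set.seed(7)
d <- data.frame(a = rnorm(10), b = rnorm(10))

# a*b expands to main effects plus the interaction term
m1 <- model.matrix(~ a * b, data = d)
m2 <- model.matrix(~ a + b + a:b, data = d)
identical(m1, m2)  # TRUE: same columns a, b, and a:b
```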
summary(lm(BMXBMI ~ BMXWT * BMXWAIST, data = nhanes1518))
##
## Call:
## lm(formula = BMXBMI ~ BMXWT * BMXWAIST, data = nhanes1518)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6955 -1.7808 -0.2051 1.5276 20.6194
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.379e+00 4.769e-01 2.892 0.00384 **
## BMXWT 4.045e-02 6.577e-03 6.150 8.01e-10 ***
## BMXWAIST 1.913e-01 5.180e-03 36.925 < 2e-16 ***
## BMXWT:BMXWAIST 6.650e-04 5.117e-05 12.997 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.71 on 10530 degrees of freedom
## (1314 observations deleted due to missingness)
## Multiple R-squared: 0.8539, Adjusted R-squared: 0.8538
## F-statistic: 2.051e+04 on 3 and 10530 DF, p-value: < 2.2e-16
# Fit the linear regression model
fit <- lm(BMXBMI ~ BMXWT * BMXWAIST, data = nhanes1518)
# Plot the interaction effects
ggplot(nhanes1518, aes(x = BMXWAIST, y = BMXBMI, color = BMXWT)) +
geom_point() +
stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
scale_color_gradient(low = "blue", high = "red") +
labs(x = "BMXWAIST", y = "BMXBMI", color = "BMXWT") +
ggtitle("Interaction effects between BMXWT, BMXWAIST, and BMXBMI")
The plot shows the relationship between BMXBMI and BMXWAIST for
different levels of BMXWT, represented by color. It provides a visual
sense of how the effect of BMXWAIST on BMXBMI differs with BMXWT: for
high levels of BMXWT, BMXWAIST has a stronger estimated effect on
BMXBMI, with a steeper slope, whereas for low levels of BMXWT the
estimated effect of BMXWAIST on BMXBMI is much weaker, with a nearly
flat slope.
A simple way to include all pairwise interaction terms is the syntax
.^2. Note that in the example below, the data frame
nhanes_na_removed also contains the response BMXBMI itself as a column,
so the model effectively regresses BMXBMI on itself: its coefficient is
estimated as 1, the residuals are numerically zero, and the R-squared
is 1. In practice, the response column should be excluded from the
predictor set.
library(dplyr)
nhanes_na_removed <- cbind(nhanes1518[1:5], nhanes1518$BMXBMI)
# data cleaning
summary(lm(nhanes1518$BMXBMI ~ .^2, data = nhanes_na_removed))
##
## Call:
## lm(formula = nhanes1518$BMXBMI ~ .^2, data = nhanes_na_removed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.956e-13 -1.400e-16 1.900e-17 1.930e-16 5.786e-15
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.194e-15 2.375e-15 -2.187e+00 0.0288 *
## SEQN -1.453e-20 2.518e-20 -5.770e-01 0.5640
## WTINT2YR -1.138e-19 1.393e-19 -8.170e-01 0.4138
## WTMEC2YR 1.221e-19 1.344e-19 9.090e-01 0.3634
## RIDSTATR NA NA NA NA
## RIAGENDR -1.760e-15 9.395e-16 -1.873e+00 0.0611 .
## `nhanes1518$BMXBMI` 1.000e+00 6.442e-17 1.552e+16 <2e-16 ***
## SEQN:WTINT2YR 7.582e-25 1.433e-24 5.290e-01 0.5967
## SEQN:WTMEC2YR -8.528e-25 1.384e-24 -6.160e-01 0.5377
## SEQN:RIDSTATR NA NA NA NA
## SEQN:RIAGENDR 1.502e-20 9.817e-21 1.530e+00 0.1261
## SEQN:`nhanes1518$BMXBMI` -3.974e-22 6.755e-22 -5.880e-01 0.5564
## WTINT2YR:WTMEC2YR -4.070e-27 7.030e-27 -5.790e-01 0.5626
## WTINT2YR:RIDSTATR NA NA NA NA
## WTINT2YR:RIAGENDR -7.333e-21 1.484e-20 -4.940e-01 0.6212
## WTINT2YR:`nhanes1518$BMXBMI` 7.194e-22 1.182e-21 6.090e-01 0.5429
## WTMEC2YR:RIDSTATR NA NA NA NA
## WTMEC2YR:RIAGENDR 7.441e-21 1.429e-20 5.210e-01 0.6026
## WTMEC2YR:`nhanes1518$BMXBMI` -7.133e-22 1.145e-21 -6.230e-01 0.5332
## RIDSTATR:RIAGENDR NA NA NA NA
## RIDSTATR:`nhanes1518$BMXBMI` NA NA NA NA
## RIAGENDR:`nhanes1518$BMXBMI` 2.472e-18 7.725e-18 3.200e-01 0.7490
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.843e-15 on 11080 degrees of freedom
## (752 observations deleted due to missingness)
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 4.834e+33 on 15 and 11080 DF, p-value: < 2.2e-16